Red Wine Quality

by Andy Herzberg

==========================================================================================================

Introduction

The goal of this exploratory analysis is to investigate which chemical properties influence the quality of red wines. The data set contains 1,599 red wines with 11 variables describing their chemical properties. At least 3 wine experts rated the quality of each wine on a scale from 0 (very bad) to 10 (excellent). The analysis examines the importance and interplay of the chemical compounds with respect to the experts’ rating.

Univariate Plots Section

Data set sample

The following table displays the first rows of the data frame.

##   X fixed.acidity volatile.acidity citric.acid residual.sugar chlorides
## 1 1           7.4             0.70        0.00            1.9     0.076
## 2 2           7.8             0.88        0.00            2.6     0.098
## 3 3           7.8             0.76        0.04            2.3     0.092
## 4 4          11.2             0.28        0.56            1.9     0.075
## 5 5           7.4             0.70        0.00            1.9     0.076
## 6 6           7.4             0.66        0.00            1.8     0.075
##   free.sulfur.dioxide total.sulfur.dioxide density   pH sulphates alcohol
## 1                  11                   34  0.9978 3.51      0.56     9.4
## 2                  25                   67  0.9968 3.20      0.68     9.8
## 3                  15                   54  0.9970 3.26      0.65     9.8
## 4                  17                   60  0.9980 3.16      0.58     9.8
## 5                  11                   34  0.9978 3.51      0.56     9.4
## 6                  13                   40  0.9978 3.51      0.56     9.4
##   quality
## 1       5
## 2       5
## 3       5
## 4       6
## 5       5
## 6       5

Learning: The data set contains an index column “X”, which is excluded from all further analysis.
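A minimal sketch of dropping the index column. The data frame is rebuilt by hand from a few columns of the sample rows above, so it is only a stand-in for the full data set. The name `df` follows the code shown later in this report; the way the column is removed is an assumption, not necessarily the author's code.

```r
# Stand-in for the full data set: first three sample rows, subset of columns
df <- data.frame(
  X             = 1:3,
  fixed.acidity = c(7.4, 7.8, 7.8),
  alcohol       = c(9.4, 9.8, 9.8),
  quality       = c(5L, 5L, 5L)
)

# Drop the index column "X"; it carries no chemical information
df$X <- NULL

names(df)
```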

Data types of the set

The following table displays each column name and its data type.

##        fixed.acidity     volatile.acidity          citric.acid 
##            "numeric"            "numeric"            "numeric" 
##       residual.sugar            chlorides  free.sulfur.dioxide 
##            "numeric"            "numeric"            "numeric" 
## total.sulfur.dioxide              density                   pH 
##            "numeric"            "numeric"            "numeric" 
##            sulphates              alcohol              quality 
##            "numeric"            "numeric"            "integer"

Learning: All columns are numeric. Only quality is stored as an integer; the other columns are floating-point values. All columns except quality contain continuous data.

Summary statistics, histograms + box plots

The following section provides an overview of the feature distributions.

Distribution of fixed.acidity

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    4.60    7.10    7.90    8.32    9.20   15.90

The distribution of fixed.acidity is right-skewed. There are some outliers with a fixed.acidity above approx. \(13g/dm^3\).
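The report does not state how outliers were identified. A common convention, and the one box plots typically draw, is the 1.5 × IQR rule. The sketch below applies that rule to the six fixed.acidity values from the sample rows above, so the resulting limits are not those of the full data set.

```r
# fixed.acidity values of the six sample rows shown earlier
fixed.acidity <- c(7.4, 7.8, 7.8, 11.2, 7.4, 7.4)

# Same six-number overview as in the report
summary(fixed.acidity)

# 1.5 x IQR rule (assumed here; the report does not name its outlier rule)
q     <- unname(quantile(fixed.acidity, probs = c(0.25, 0.75)))
iqr   <- q[2] - q[1]
lower <- q[1] - 1.5 * iqr
upper <- q[2] + 1.5 * iqr

outliers <- fixed.acidity[fixed.acidity < lower | fixed.acidity > upper]
outliers
```

In this small sample only the value 11.2 lies outside the limits.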

Distribution of volatile.acidity

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##  0.1200  0.3900  0.5200  0.5278  0.6400  1.5800

The distribution of volatile.acidity is slightly right-skewed, with some outliers above \(1.0g/dm^3\). Mean and median are nearly identical.

Distribution of citric.acid

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   0.000   0.090   0.260   0.271   0.420   1.000

The distribution of citric.acid is right-skewed. Mean and median are close together.

Distribution of residual.sugar

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   0.900   1.900   2.200   2.539   2.600  15.500

The distribution of residual.sugar is right-skewed, with many outliers above approx. \(3.5g/dm^3\).

Distribution of chlorides

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
## 0.01200 0.07000 0.07900 0.08747 0.09000 0.61100

The distribution of chlorides is right-skewed, with many outliers above approx. \(0.15g/dm^3\).

Distribution of free.sulfur.dioxide

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    1.00    7.00   14.00   15.87   21.00   72.00

The distribution of free.sulfur.dioxide is right-skewed. There are several outliers above approx. \(42mg/dm^3\).

Distribution of total.sulfur.dioxide

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    6.00   22.00   38.00   46.47   62.00  289.00

The distribution of total.sulfur.dioxide is right-skewed. There are several outliers above \(120mg/dm^3\).

Distribution of density

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##  0.9901  0.9956  0.9968  0.9967  0.9978  1.0040

Except for some outliers, density is approximately normally distributed. Therefore mean and median are nearly identical.

Distribution of pH

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   2.740   3.210   3.310   3.311   3.400   4.010

The pH is also approximately normally distributed.

Distribution of sulphates

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##  0.3300  0.5500  0.6200  0.6581  0.7300  2.0000

The distribution of sulphates is right-skewed, with some outliers above \(1.0g/dm^3\).

Distribution of alcohol

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    8.40    9.50   10.20   10.42   11.10   14.90

The distribution of alcohol is also right-skewed. 50% of the wines have an alcohol concentration between 9.50% and 11.10% by volume.

Distribution of experts' quality rating


The experts' ratings range from 3 to 8. About 1,300 wines (1,319) were rated 5 or 6. The mode is a quality of 5, with 681 ratings.

# Display counts of the discrete experts' quality rating values
table(df$quality)
## 
##   3   4   5   6   7   8 
##  10  53 681 638 199  18
# Detect the most frequent quality rating (which.max prints the rating as name, with its position in the table below it)
which.max(table(df$quality))
## 5 
## 3

Facet wrap investigations


Based on the data, there is no clear relationship between fixed.acidity and quality. At first the quality seems to rise with higher fixed.acidity, but after a rating of 7 it decreases.

## Difference of features' means from lowest and highest rating:  0.2066667


Based on the data, higher volatile.acidity is on average associated with lower experts' quality ratings.

## Difference of features' means from lowest and highest rating:  -0.4611667


Based on the data, higher citric.acid is on average associated with a higher quality rating.

## Difference of features' means from lowest and highest rating:  0.2201111


Based on the data, residual.sugar does not appear to have a big impact on the experts' quality ratings on average.

## Difference of features' means from lowest and highest rating:  -0.05722222


Based on the data, chlorides seem to have little influence on quality. Wines with lower chlorides are rated slightly better on average.

## Difference of features' means from lowest and highest rating:  -0.05405556


Based on the data, high concentrations of free.sulfur.dioxide on average go along with medium quality ratings, while low concentrations go along with either poor or good ratings.

## Difference of features' means from lowest and highest rating:  2.277778


As with free.sulfur.dioxide, high concentrations of total.sulfur.dioxide on average go along with medium quality ratings, while low concentrations go along with either poor or good ratings.

## Difference of features' means from lowest and highest rating:  8.544444


Based on the data we can say that on average lower density results in better quality ratings.

## Difference of features' means from lowest and highest rating:  -0.002251778


Based on the data we can say that on average there is a relationship between pH value and quality. The lower the pH the higher the experts' quality rating.

## Difference of features' means from lowest and highest rating:  -0.1307778


Based on the data we can say that on average the higher the sulphates concentration the better the quality ratings.

## Difference of features' means from lowest and highest rating:  0.1977778


Based on the data we can say that on average the higher the alcohol concentration the better the experts' quality ratings.

## Difference of features' means from lowest and highest rating:  2.139444

Univariate Analysis

What is the structure of your dataset?

The dataset contains 1,599 rows. Each row represents one red wine and consists of 11 continuous measurements of chemical properties and a discrete expert rating of the wine's quality. There are no missing values in the data set. Several attributes may be correlated, as suggested by the information attached to the data set, by the variable names (for example free.sulfur.dioxide and total.sulfur.dioxide), and by the behaviour of the features' box plots in the facet grid exploration.

What is/are the main feature(s) of interest in your dataset?

Inspecting the faceted box plots visually, the following features seem to have an effect on the experts' quality ratings:
* fixed.acidity (positive effect)
* volatile.acidity (negative effect)
* citric.acid (positive effect)
* chlorides (negative effect)
* free.sulfur.dioxide (positive/negative effect)
* total.sulfur.dioxide (positive/negative effect)
* density (negative effect)
* pH (negative effect)
* sulphates (positive effect)
* alcohol (positive effect)

When comparing the features' means for the highest and lowest ratings, the following features seem to be most important for the experts' rating (ordered from largest to smallest absolute difference):
* total.sulfur.dioxide: 8.544
* free.sulfur.dioxide: 2.278
* alcohol: 2.139
* volatile.acidity: -0.461
* citric.acid: 0.220
* fixed.acidity: 0.207
* sulphates: 0.198
* pH: -0.131
* residual.sugar: -0.057
* chlorides: -0.054
* density: -0.002
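The report does not show how these differences were computed. One plausible reading is the feature's mean at the highest rating minus its mean at the lowest rating; the reported signs (positive for alcohol, negative for volatile.acidity) are consistent with that. The sketch below illustrates this reading with made-up alcohol values, not the real data.

```r
# Made-up illustration data: a few wines with quality ratings and alcohol values
df <- data.frame(
  quality = c(3L, 3L, 5L, 6L, 8L, 8L),
  alcohol = c(9.0, 10.0, 9.8, 10.5, 12.0, 13.0)
)

# Mean alcohol per quality level
means <- aggregate(alcohol ~ quality, data = df, FUN = mean)

# Assumed definition: mean at the highest rating minus mean at the lowest rating
delta <- means$alcohol[which.max(means$quality)] -
         means$alcohol[which.min(means$quality)]
delta
```

Because the features are measured on different scales (for example mg/dm^3 for sulfur dioxide and g/cm^3 for density), raw differences like these are not directly comparable across features.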

What other features in the dataset do you think will help support your investigation into your feature(s) of interest?

The interplay between the features will help support the investigation in the following bivariate and multivariate steps. I expect that the features with the biggest difference between good and bad ratings play the biggest role for the experts' rating.

Did you create any new variables from existing variables in the dataset?

I didn’t create any new variables, but I did a facet grid exploration and a mean comparison of good and bad ratings.

Of the features you investigated, were there any unusual distributions?

There are right-skewed distributions for several features. Inspecting the features with the facet grid showed that especially the best and worst ratings were skewed.

Bivariate Plots Section

Correlation analysis

The following table displays the Pearson correlation coefficient r for each pair of features. A correlation coefficient of ±0.5 indicates a strong correlation, ±0.3 indicates a medium correlation, and ±0.1 indicates a weak correlation.

##                      fixed.acidity volatile.acidity citric.acid
## fixed.acidity                  1.0             -0.3         0.7
## volatile.acidity              -0.3              1.0        -0.6
## citric.acid                    0.7             -0.6         1.0
## residual.sugar                 0.1              0.0         0.1
## chlorides                      0.1              0.1         0.2
## free.sulfur.dioxide           -0.2              0.0        -0.1
## total.sulfur.dioxide          -0.1              0.1         0.0
## density                        0.7              0.0         0.4
## pH                            -0.7              0.2        -0.5
## sulphates                      0.2             -0.3         0.3
## alcohol                       -0.1             -0.2         0.1
## quality                        0.1             -0.4         0.2
##                      residual.sugar chlorides free.sulfur.dioxide
## fixed.acidity                   0.1       0.1                -0.2
## volatile.acidity                0.0       0.1                 0.0
## citric.acid                     0.1       0.2                -0.1
## residual.sugar                  1.0       0.1                 0.2
## chlorides                       0.1       1.0                 0.0
## free.sulfur.dioxide             0.2       0.0                 1.0
## total.sulfur.dioxide            0.2       0.0                 0.7
## density                         0.4       0.2                 0.0
## pH                             -0.1      -0.3                 0.1
## sulphates                       0.0       0.4                 0.1
## alcohol                         0.0      -0.2                -0.1
## quality                         0.0      -0.1                -0.1
##                      total.sulfur.dioxide density   pH sulphates alcohol
## fixed.acidity                        -0.1     0.7 -0.7       0.2    -0.1
## volatile.acidity                      0.1     0.0  0.2      -0.3    -0.2
## citric.acid                           0.0     0.4 -0.5       0.3     0.1
## residual.sugar                        0.2     0.4 -0.1       0.0     0.0
## chlorides                             0.0     0.2 -0.3       0.4    -0.2
## free.sulfur.dioxide                   0.7     0.0  0.1       0.1    -0.1
## total.sulfur.dioxide                  1.0     0.1 -0.1       0.0    -0.2
## density                               0.1     1.0 -0.3       0.1    -0.5
## pH                                   -0.1    -0.3  1.0      -0.2     0.2
## sulphates                             0.0     0.1 -0.2       1.0     0.1
## alcohol                              -0.2    -0.5  0.2       0.1     1.0
## quality                              -0.2    -0.2 -0.1       0.3     0.5
##                      quality
## fixed.acidity            0.1
## volatile.acidity        -0.4
## citric.acid              0.2
## residual.sugar           0.0
## chlorides               -0.1
## free.sulfur.dioxide     -0.1
## total.sulfur.dioxide    -0.2
## density                 -0.2
## pH                      -0.1
## sulphates                0.3
## alcohol                  0.5
## quality                  1.0
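A table like the one above can be produced with base R's `cor()`. The sketch below uses three columns of the six sample rows shown at the start of the report, so its coefficients differ from the full-data values. The helper `strength()` is hypothetical and simply encodes the ±0.5 / ±0.3 / ±0.1 thresholds stated above.

```r
# Three columns of the six sample rows shown earlier (not the full data set)
df <- data.frame(
  fixed.acidity = c(7.4, 7.8, 7.8, 11.2, 7.4, 7.4),
  pH            = c(3.51, 3.20, 3.26, 3.16, 3.51, 3.51),
  alcohol       = c(9.4, 9.8, 9.8, 9.8, 9.4, 9.4)
)

# Pairwise Pearson coefficients, rounded to one decimal as in the report
r <- round(cor(df, method = "pearson"), 1)
r

# Hypothetical helper: label a coefficient using the report's thresholds
strength <- function(r) {
  a <- abs(r)
  ifelse(a >= 0.5, "strong",
  ifelse(a >= 0.3, "medium",
  ifelse(a >= 0.1, "weak", "negligible")))
}

strength(c(0.7, -0.4, 0.2, 0.0))
# "strong" "medium" "weak" "negligible"
```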

The p-value is the probability of observing a correlation at least as strong as the measured one if the true correlation were zero. A p-value greater than 0.05 means that the correlation is not statistically significant at the 5% level; a p-value less than 0.05 means it is significant.

##                      fixed.acidity volatile.acidity citric.acid
## fixed.acidity              0.00000          0.00000     0.00000
## volatile.acidity           0.00000          0.00000     0.00000
## citric.acid                0.00000          0.00000     0.00000
## residual.sugar             0.00000          0.93892     0.00000
## chlorides                  0.00018          0.01422     0.00000
## free.sulfur.dioxide        0.00000          0.67470     0.01474
## total.sulfur.dioxide       0.00001          0.00221     0.15555
## density                    0.00000          0.37876     0.00000
## pH                         0.00000          0.00000     0.00000
## sulphates                  0.00000          0.00000     0.00000
## alcohol                    0.01365          0.00000     0.00001
## quality                    0.00000          0.00000     0.00000
##                      residual.sugar chlorides free.sulfur.dioxide
## fixed.acidity               0.00000   0.00018             0.00000
## volatile.acidity            0.93892   0.01422             0.67470
## citric.acid                 0.00000   0.00000             0.01474
## residual.sugar              0.00000   0.02617             0.00000
## chlorides                   0.02617   0.00000             0.82412
## free.sulfur.dioxide         0.00000   0.82412             0.00000
## total.sulfur.dioxide        0.00000   0.05809             0.00000
## density                     0.00000   0.00000             0.38050
## pH                          0.00061   0.00000             0.00487
## sulphates                   0.82521   0.00000             0.03888
## alcohol                     0.09258   0.00000             0.00549
## quality                     0.58322   0.00000             0.04283
##                      total.sulfur.dioxide density      pH sulphates
## fixed.acidity                     0.00001 0.00000 0.00000   0.00000
## volatile.acidity                  0.00221 0.37876 0.00000   0.00000
## citric.acid                       0.15555 0.00000 0.00000   0.00000
## residual.sugar                    0.00000 0.00000 0.00061   0.82521
## chlorides                         0.05809 0.00000 0.00000   0.00000
## free.sulfur.dioxide               0.00000 0.38050 0.00487   0.03888
## total.sulfur.dioxide              0.00000 0.00435 0.00782   0.08602
## density                           0.00435 0.00000 0.00000   0.00000
## pH                                0.00782 0.00000 0.00000   0.00000
## sulphates                         0.08602 0.00000 0.00000   0.00000
## alcohol                           0.00000 0.00000 0.00000   0.00018
## quality                           0.00000 0.00000 0.02096   0.00000
##                      alcohol quality
## fixed.acidity        0.01365 0.00000
## volatile.acidity     0.00000 0.00000
## citric.acid          0.00001 0.00000
## residual.sugar       0.09258 0.58322
## chlorides            0.00000 0.00000
## free.sulfur.dioxide  0.00549 0.04283
## total.sulfur.dioxide 0.00000 0.00000
## density              0.00000 0.00000
## pH                   0.00000 0.02096
## sulphates            0.00018 0.00000
## alcohol              0.00000 0.00000
## quality              0.00000 0.00000

The following correlogram visualizes the correlation between the features. If the p-value is not significant, the correlation coefficient is set to 0.
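A minimal sketch of how p-values can be obtained and how non-significant coefficients can be set to 0, using base R's `cor.test()`. It runs on three columns of the six sample rows from the start of the report, so the numbers are not those of the full data set. The loop and the 0.05 cut-off follow the description above; the report's own implementation is not shown and may differ.

```r
# Three columns of the six sample rows shown earlier (not the full data set)
df <- data.frame(
  fixed.acidity = c(7.4, 7.8, 7.8, 11.2, 7.4, 7.4),
  pH            = c(3.51, 3.20, 3.26, 3.16, 3.51, 3.51),
  sulphates     = c(0.56, 0.68, 0.65, 0.58, 0.56, 0.56)
)

r <- cor(df)  # Pearson coefficients

# p-value matrix; the diagonal stays 0, as in the table above
p <- matrix(0, ncol(df), ncol(df), dimnames = dimnames(r))
for (i in seq_len(ncol(df))) {
  for (j in seq_len(ncol(df))) {
    if (i != j) p[i, j] <- cor.test(df[[i]], df[[j]])$p.value
  }
}

# Set coefficients that are not significant at the 5% level to 0
r_masked <- r
r_masked[p > 0.05] <- 0
round(r_masked, 1)
```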


The correlogram illustrates the relationships among the features and between the features and quality. The values show the Pearson correlation coefficient. A value of ±0.5 indicates a strong correlation, ±0.3 a medium correlation, and ±0.1 a weak correlation. We can see that the alcohol concentration is positively correlated with quality. There is also a negative correlation of volatile.acidity with quality. Some features are also correlated with other features, for example free.sulfur.dioxide and total.sulfur.dioxide.

Scatter plots of the strongest correlations


As seen from the correlogram, free.sulfur.dioxide and total.sulfur.dioxide are highly correlated, with a Pearson correlation coefficient of 0.7. The positive correlation shows that total.sulfur.dioxide rises with increasing free.sulfur.dioxide. This can also be seen in this scatter plot. The red dotted line visualizes the regression line for the data.


As seen from the correlogram, citric.acid and fixed.acidity are highly correlated, with a Pearson correlation coefficient of 0.7. The positive correlation shows that fixed.acidity rises with increasing citric.acid. This can also be seen in this scatter plot. The red dotted line visualizes the regression line for the data.


As seen from the correlogram, fixed.acidity and density are highly correlated, with a Pearson correlation coefficient of 0.7. The positive correlation shows that density rises with increasing fixed.acidity. This can also be seen in this scatter plot. The red dotted line visualizes the regression line for the data.


As seen from the correlogram, fixed.acidity and pH are highly correlated, with a Pearson correlation coefficient of -0.7. The negative correlation shows that the pH decreases with increasing fixed.acidity. This can also be seen in this scatter plot. The red dotted line visualizes the regression line for the data.
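A scatter plot with a dotted red regression line can be sketched in base R as follows. The report's plots were probably drawn with a plotting package, which is not shown, so base graphics are swapped in here as an assumption. The values are the fixed.acidity and pH columns of the six sample rows from the start of the report.

```r
# fixed.acidity and pH of the six sample rows shown earlier
fixed.acidity <- c(7.4, 7.8, 7.8, 11.2, 7.4, 7.4)
pH            <- c(3.51, 3.20, 3.26, 3.16, 3.51, 3.51)

# Simple linear regression of pH on fixed.acidity
fit   <- lm(pH ~ fixed.acidity)
slope <- unname(coef(fit)["fixed.acidity"])
slope  # negative: pH falls as fixed.acidity rises

# Scatter plot with the fitted line drawn dotted in red
plot(fixed.acidity, pH, xlab = "fixed.acidity (g/dm^3)", ylab = "pH")
abline(fit, col = "red", lty = "dotted")
```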

Bivariate Analysis

Talk about some of the relationships you observed in this part of the analysis.

The correlation analysis validates a medium to strong relationship between:
* quality <> alcohol
* quality <> volatile.acidity
* quality <> sulphates

Did you observe any interesting relationships between the other features (not the main feature(s) of interest)?

The correlation analysis validates a strong positive relationship between:
* total.sulfur.dioxide <> free.sulfur.dioxide
* residual.sugar <> density
* sulphates <> chlorides
* citric.acid <> fixed.acidity
* fixed.acidity <> density

Furthermore, there is a strong negative relationship between:
* pH <> fixed.acidity
* pH <> citric.acid
* volatile.acidity <> citric.acid
* alcohol <> density

What was the strongest relationship you found?

The strongest relationships are between:
* total.sulfur.dioxide <> free.sulfur.dioxide
* citric.acid <> fixed.acidity
* fixed.acidity <> density
* pH <> fixed.acidity

Multivariate Plots Section

Scatter plots of the strongest correlations


This scatterplot visualizes the relationship between free.sulfur.dioxide and total.sulfur.dioxide. The quality is visualized by color saturation. We are not able to detect a clear relationship with the quality ratings here.


This scatter plot visualizes the relationship between citric.acid and fixed.acidity. There seems to be a linear relationship between the two: the higher the concentration of citric.acid, the higher the fixed.acidity. Better wines seem to have more citric.acid and, correspondingly, more fixed.acidity.


As for the previous plot we can also detect a linear relationship of fixed.acidity and density. The higher the fixed.acidity, the higher the density. We are not able to detect a clear relationship with the experts' quality ratings as good ratings can be found with small fixed.acidity / density values and with relatively high ones, too.


This scatterplot visualizes the relationship between fixed.acidity and pH (which are negatively correlated). Although the relationship between these values is linear, we cannot detect any relationship with the quality ratings, as good ratings can be found with small fixed.acidity / pH values and with relatively high ones, too.


This scatterplot visualizes the relationship between alcohol and sulphates. The color saturation shows that good wines tend to have higher alcohol and higher sulphates concentration.


This scatterplot visualizes the relationship of alcohol and volatile.acidity with the experts' quality ratings. Higher alcohol concentration tends to go along with better ratings. Lower concentration of volatile.acidity also goes along with better ratings, but the relationship doesn’t seem as strong as for alcohol.


This scatterplot visualizes the relationship between alcohol and citric.acid. Based on the data we can say that higher concentration of alcohol and higher concentration of citric.acid results in better experts' quality ratings.

## 
## Call:
## lm(formula = quality ~ fixed.acidity + volatile.acidity + citric.acid + 
##     residual.sugar + chlorides + free.sulfur.dioxide + total.sulfur.dioxide + 
##     density + pH + sulphates + alcohol, data = df)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -2.68911 -0.36652 -0.04699  0.45202  2.02498 
## 
## Coefficients:
##                         Estimate  Std. Error t value             Pr(>|t|)
## (Intercept)           21.9652084  21.1945750   1.036               0.3002
## fixed.acidity          0.0249906   0.0259485   0.963               0.3357
## volatile.acidity      -1.0835903   0.1211013  -8.948 < 0.0000000000000002
## citric.acid           -0.1825639   0.1471762  -1.240               0.2150
## residual.sugar         0.0163313   0.0150021   1.089               0.2765
## chlorides             -1.8742252   0.4192832  -4.470  0.00000837395338361
## free.sulfur.dioxide    0.0043613   0.0021713   2.009               0.0447
## total.sulfur.dioxide  -0.0032646   0.0007287  -4.480  0.00000800460981846
## density              -17.8811638  21.6330999  -0.827               0.4086
## pH                    -0.4136531   0.1915974  -2.159               0.0310
## sulphates              0.9163344   0.1143375   8.014  0.00000000000000213
## alcohol                0.2761977   0.0264836  10.429 < 0.0000000000000002
##                         
## (Intercept)             
## fixed.acidity           
## volatile.acidity     ***
## citric.acid             
## residual.sugar          
## chlorides            ***
## free.sulfur.dioxide  *  
## total.sulfur.dioxide ***
## density                 
## pH                   *  
## sulphates            ***
## alcohol              ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.648 on 1587 degrees of freedom
## Multiple R-squared:  0.3606, Adjusted R-squared:  0.3561 
## F-statistic: 81.35 on 11 and 1587 DF,  p-value: < 0.00000000000000022

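From the linear regression model we can say that volatile.acidity, chlorides, total.sulfur.dioxide, sulphates and alcohol are the most significant predictors of the experts' quality ratings (each with p < 0.001). The model explains about 36% of the variance in quality (multiple R-squared of 0.3606).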
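The selection of significant predictors from an `lm()` fit can be sketched as follows. The data here are simulated with made-up coefficients, purely to keep the example self-contained; the names `sim` and `sig` and the simulated effect sizes are assumptions and have nothing to do with the wine data.

```r
set.seed(1)
n <- 200

# Simulated stand-in data: quality depends on alcohol and volatile.acidity only
sim <- data.frame(
  alcohol          = runif(n, 8, 15),
  volatile.acidity = runif(n, 0.1, 1.6),
  residual.sugar   = runif(n, 1, 15)   # no effect in this simulation
)
sim$quality <- 3 + 0.3 * sim$alcohol - 1.0 * sim$volatile.acidity +
  rnorm(n, sd = 0.6)

fit   <- lm(quality ~ alcohol + volatile.acidity + residual.sugar, data = sim)
coefs <- summary(fit)$coefficients

# Keep predictors with p < 0.001 (the '***' level in R's summary output)
predictors <- coefs[rownames(coefs) != "(Intercept)", , drop = FALSE]
sig <- rownames(predictors)[predictors[, "Pr(>|t|)"] < 0.001]
sig
```

A small p-value indicates that an effect is statistically distinguishable from zero, not that it is large; the size of an effect is given by the coefficient estimate and depends on the unit of the feature.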

Multivariate Analysis

Talk about some of the relationships you observed in this part of the investigation. Were there features that strengthened each other in terms of looking at your feature(s) of interest?

According to the Pearson correlation analysis, the strongest correlations with quality were found for alcohol, volatile.acidity and sulphates. While alcohol had the strongest relationship with quality, the quality ratings seem even better in combination with sulphates and citric.acid.

Were there any interesting or surprising interactions between features?

The influence of the combination of citric.acid and alcohol on the experts' quality rating was surprising to me.

OPTIONAL: Did you create any models with your dataset? Discuss the strengths and limitations of your model.

I trained a linear regression model. The most significant features for the regression are:
* volatile.acidity
* chlorides
* total.sulfur.dioxide
* sulphates
* alcohol
All of these features already showed an effect during the facet grid examination.

Final Plots and Summary

Plot One

Description One

The first chart visualizes the relationship between the alcohol concentration in red wine and the experts' quality rating of the wine. Although there is a gap at the rating of 5, a clear linear relationship can be observed.

Plot Two

Description Two

This scatterplot visualizes the relationship between alcohol and sulphates. The color saturation shows that good wines tend to have higher alcohol and higher sulphates concentration.

Plot Three

Description Three

This scatterplot visualizes the relationship between alcohol and citric.acid. Based on the data we can say that higher concentration of alcohol and higher concentration of citric.acid results in better experts' quality ratings.


Reflection

In the first step I inspected the histograms to get a first idea of the data. It was not possible to draw conclusions about which ingredients are responsible for good ratings. Breaking down the data using the facet grid exploration helped to gain these insights. It became clear which features seem to influence the ratings in a positive or negative way. To find correlations among the features and between the features and the experts' rating, a Pearson correlation analysis was carried out. Alcohol showed the biggest direct influence on the quality ratings. It was surprising that alcohol was the only feature with a big influence on the experts' ratings. It was difficult to detect supporting features just by looking at the scatter plots. Therefore a linear regression model was fitted to help find the supporting features. With the help of the most important features of the linear regression model, the final plots were prepared and selected.

Appendix

Attribute information

Input variables (based on physicochemical tests):

1 - fixed acidity (tartaric acid - g / dm^3)
2 - volatile acidity (acetic acid - g / dm^3)
3 - citric acid (g / dm^3)
4 - residual sugar (g / dm^3)
5 - chlorides (sodium chloride - g / dm^3)
6 - free sulfur dioxide (mg / dm^3)
7 - total sulfur dioxide (mg / dm^3)
8 - density (g / cm^3)
9 - pH
10 - sulphates (potassium sulphate - g / dm^3)
11 - alcohol (% by volume)

Output variable (based on sensory data):

12 - quality (score between 0 and 10)

Description of attributes:

1 - fixed acidity: most acids involved with wine are fixed or nonvolatile (do not evaporate readily)
2 - volatile acidity: the amount of acetic acid in wine, which at too high of levels can lead to an unpleasant, vinegar taste
3 - citric acid: found in small quantities, citric acid can add ‘freshness’ and flavor to wines
4 - residual sugar: the amount of sugar remaining after fermentation stops, it’s rare to find wines with less than 1 gram/liter and wines with greater than 45 grams/liter are considered sweet
5 - chlorides: the amount of salt in the wine
6 - free sulfur dioxide: the free form of SO2 exists in equilibrium between molecular SO2 (as a dissolved gas) and bisulfite ion; it prevents microbial growth and the oxidation of wine
7 - total sulfur dioxide: amount of free and bound forms of SO2; in low concentrations, SO2 is mostly undetectable in wine, but at free SO2 concentrations over 50 ppm, SO2 becomes evident in the nose and taste of wine
8 - density: the density of wine is close to that of water depending on the percent alcohol and sugar content
9 - pH: describes how acidic or basic a wine is on a scale from 0 (very acidic) to 14 (very basic); most wines are between 3-4 on the pH scale
10 - sulphates: a wine additive which can contribute to sulfur dioxide gas (SO2) levels, which acts as an antimicrobial and antioxidant
11 - alcohol: the percent alcohol content of the wine
